ABSTRACT

Lanchester's equations are commonly used as the basis for force-on-force combat models, even if
only as a metamodel for a more complex combat simulation. This report examines whether
attrition is adequately modelled by such Markov processes. It shows that the distribution of
historical battle casualties is consistent with that obtained when attrition is modelled as an Ito
process. The additional Wiener term can be regarded as representing the impact of the wider
environment on attrition rates.

Application of Black Scholes Complexity Concepts to Combat Modelling


Executive Summary

Lanchester's equations are commonly used as the basis for force-on-force combat models,
even if only as a metamodel for a more complex combat model. These equations define a
system with the strengths of the forces involved comprising its internal parameters. Many
systems are adequately described using just their internal parameters, without
consideration of any interactions between that system and its wider environment.
However, it is apparent from the work on extending combat models based on Lanchester's
equations to include additional parameters such as morale, spatial force dispersion and
movement, that such quantities do affect attrition rates. The inclusion of additional
parameters also results in additional complexity and the loss of the insight that a simple
model provides. Ideally, what is desired is a means to include the effect of the wider
environment on attrition rates without also increasing the model's complexity.

The  standard  model  for  the  behaviour  of  stock  prices  in  time  assumes  they  are  a 
continuous Markov process with a constant fractional drift rate. The Black and Scholes
model of stock prices treats price volatility as resulting from the action of the rest of the
market on the system comprised of the one stock price being modelled. Furthermore, it
does not attempt to model the processes by which the market might affect the stock price,
arguing that the mechanisms are too complex to model or are not known.

Lanchester's Equations are similar to the starting point for the derivation of the Black
Scholes Equation. This suggests an obvious approach for including the effect of the wider
environment in the evaluation of combat attrition rates, through the addition of a Wiener
process, turning Lanchester's Markov process into an Ito process.

The present work has used an existing database of historical battle results to show that the
frequency distribution of battle casualties is consistent with that expected when
Lanchester's equations are augmented to form an Ito Process rather than the conventional
Markov  Process.  The  additional  Wiener  term  can  be  regarded  as  representing  the  impact 
of  the  wider  environment  on  attrition  rates.  The  shape  of  the  casualty  frequency 
distribution  was  not  observed  in  the  initial  force  strength  distribution.  This  supports  the 
contention  that  such  distributions  result  from  the  attrition  process  itself  and  are  not 
artefacts  of  the  sampling  or  analysis  procedure. 

The database used, and indeed all such databases, was shown to include an inherent bias
which under-represents the number of small battles. While the effect of such bias was
observed, by incorporating strata sampling concepts it was possible to confine the effects
of such bias to a single stratum and a small number of data points, which can then be
allowed for.

1. Introduction


A  model  of  combat  has  long  been  sought,  arguably  since  the  time  of  Sun  Tze  half  a 
millennium  before  the  Christian  era.  However,  prior  to  the  early  Twentieth  Century  all 
models  that  had  been  developed  were  essentially  descriptive  or  narrative  in  nature.  This 
changed  with  the  advent  of  Lanchester's  model  of  air  combat  in  1914  [1],  which  was  the 
first  attempt  to  construct  a  mathematical  model  of  combat.  Lanchester's  equations  have 
subsequently been used as the basis for most force-on-force combat models, even if only as a
metamodel  for  a  more  complex  combat  simulation  [2]  [3]. 

Lanchester's mathematical formulation of his model was incomplete. In addition to his
two equations, it required processes to initiate combat, commit forces and allocate effort,
and an arrangement of forces to carry out those decisions, so as to facilitate a hierarchical
decomposition of a battle into smaller sub-battles to which Lanchester's Equations are then
applied. Extensive research to develop mathematical formulations to replace those
processes has subsequently taken place [4].

Combat  between  two  sides  of  strength  x(t)  and  y(t)  is  generally  described  by  simple 
differential equations, such as those for Lanchester's "modern" combat:

$$
\begin{aligned}
\frac{dx}{dt} &= -a\,y(t), & x(0) &= x_0 \\
\frac{dy}{dt} &= -b\,x(t), & y(0) &= y_0
\end{aligned}
\tag{1}
$$


which may be modified to include additional combat effects [4], or supplemented by
additional differential equations that describe the interaction of other parameters [5]. Indeed,
Taylor [4] has expressed the belief that regardless of how a mathematical model of combat
is constructed, the attrition of engaged forces should be expressible in terms of a series of
coupled differential equations, which in general terms may be written as:


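$$
\begin{aligned}
\frac{dx_i}{dt} &= f_i(x_1, \ldots, x_k, y_1, \ldots, y_n, b_1, \ldots, b_m) \\
\frac{dy_j}{dt} &= g_j(x_1, \ldots, x_k, y_1, \ldots, y_n, b_1, \ldots, b_m)
\end{aligned}
\tag{2}
$$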


where $f_i$ and $g_j$ are arbitrary functions of the independent variables $x_i$ and $y_j$ of sides X
and Y, as well as of the dependent parameters $b_1, \ldots, b_m$.
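As a concrete instance of such coupled equations, equation 1 can be integrated numerically. The following is a minimal sketch only: the function name `lanchester_rk4`, the choice of a classical fourth-order Runge-Kutta scheme and all parameter values are illustrative assumptions and are not taken from the report.

```python
def lanchester_rk4(x0, y0, a, b, t_end, steps):
    """Integrate dx/dt = -a*y, dy/dt = -b*x (equation 1) with classical RK4.

    Returns a list of (t, x, y) tuples. Hypothetical helper for illustration;
    it does not stop the battle if a strength would become negative.
    """
    h = t_end / steps
    x, y = float(x0), float(y0)
    history = [(0.0, x, y)]
    for n in range(1, steps + 1):
        # Four slope evaluations of the coupled right-hand sides.
        k1x, k1y = -a * y, -b * x
        k2x, k2y = -a * (y + 0.5 * h * k1y), -b * (x + 0.5 * h * k1x)
        k3x, k3y = -a * (y + 0.5 * h * k2y), -b * (x + 0.5 * h * k2x)
        k4x, k4y = -a * (y + h * k3y), -b * (x + h * k3x)
        x += h * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        y += h * (k1y + 2.0 * k2y + 2.0 * k3y + k4y) / 6.0
        history.append((n * h, x, y))
    return history


if __name__ == "__main__":
    # Illustrative strengths and attrition coefficients (assumed values).
    for t, x, y in lanchester_rk4(1000, 800, 0.01, 0.012, 10.0, 100)[::20]:
        print(f"t={t:5.1f}  x={x:8.2f}  y={y:8.2f}")
```

Both strengths decrease together because each side's loss rate depends only on the opposing strength.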

The  present  work  will  examine  the  application  of  a  common  technique  by  which  simple 
metamodels  are  developed  from  such  systems  of  equations,  before  considering  an 
additional  approach  for  including  the  impact  of  the  wider  environment.  Finally,  it  will 
compare  observed  casualty  patterns  in  available  historical  data  with  a  proposed  attrition 
model.

2. Combat Metamodels

A  model  can  be  considered  as  a  representation  of  an  actual  situation  that  may  be  used  to 
better  understand  that  situation.  Complex  phenomena  often  require  complex  models  if  the 
model's  behaviour  is  to  reproduce  that  of  the  real  world.  However,  while  such  models 
produce  reasonable  agreement  with  real  world  results,  they  are  less  useful  in 
understanding  the  functional  dependence  of  the  modelled  quantities  on  the  input 
parameters.  In  such  cases  it  is  useful  to  develop  a  (simpler)  model  of  that  model  which, 
although  providing  lower  fidelity  results,  is  better  at  explaining  the  causes  of  those  results. 
Such models are the metamodels.

Metamodels  described  by  systems  of  equations  similar  to  those  above  (equation  2)  are 
often  developed  using  an  ad-hoc  unstructured  approach  such  as  dimensional  analysis.  In 
recent years a systematic treatment of the process for the development of such metamodels
has emerged (intermediate asymptotic approximation) that provides a degree of rigour to
the undertaking [6].

While  the  interested  reader  is  referred  to  the  work  by  Barenblatt  [6]  for  a  comprehensive 
treatment  of  the  application  of  similarity  principles,  a  brief  summary  is  given  below.  It 
must  be  noted  that  the  use  of  dimensional  analysis  in  the  study  of  similarity  laws  is  only 
strictly  correct  for  self-similar  problems.  However,  previous  work  by  the  author  [3]  and 
others  [2]  provides  some  justification  in  the  application  of  this  approach  in  combat 
attrition  modelling. 

2.1 Intermediate Asymptotics

The  relationship  between  an  outcome  of  the  model,  and  a  set  of  input  variables  can  be 
written  as: 

$$o = f(a_1, \ldots, a_k, b_1, \ldots, b_m) \tag{3}$$

where the variables $a_1, \ldots, a_k$ have independent dimensions (the dimension of any of the $a$'s
cannot be expressed as a combination of the dimensions of the other $a$'s), and the
dimensions of the $b_1, \ldots, b_m$ can be expressed in terms of products of powers of the
dimensions of $a_1, \ldots, a_k$. In general k > 0 and m > 0. In the example system of equation 2, the
model outcomes are the rates of change of the strengths of the forces engaged.

The values of the $a_1, \ldots, a_k$ can be independently varied, so that

$$a_i' = A_i a_i \tag{4}$$

The dimensions of $o$ and $b_1, \ldots, b_m$ may then be represented as power monomials in the
dimensions of $a_1, \ldots, a_k$, for example:

$$
\begin{aligned}
[b_j] &= [a_1]^{p_j} \cdots [a_k]^{r_j} \\
[o] &= [a_1]^{p} \cdots [a_k]^{r}
\end{aligned}
\tag{5}
$$

With the corresponding transformation of values:

$$
\begin{aligned}
b_j' &= A_1^{p_j} \cdots A_k^{r_j}\, b_j \\
o' &= A_1^{p} \cdots A_k^{r}\, o
\end{aligned}
\tag{6}
$$


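Introducing the similarity parameters:

$$
\begin{aligned}
\Pi_j &= \frac{b_j}{a_1^{p_j} \cdots a_k^{r_j}} \\
\Pi &= \frac{o}{a_1^{p} \cdots a_k^{r}}
\end{aligned}
\tag{7}
$$

where the exponents of the variables are chosen so that the parameters $\Pi$ and $\Pi_1, \ldots, \Pi_m$ are
dimensionless, enables equation (3) to be re-written as:

$$
\Pi = \frac{1}{a_1^{p} \cdots a_k^{r}}\,
f\!\left(a_1, \ldots, a_k,\; \Pi_1 a_1^{p_1} \cdots a_k^{r_1},\; \ldots,\; \Pi_m a_1^{p_m} \cdots a_k^{r_m}\right)
\tag{8}
$$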


Barenblatt  [6]  shows  that  this  in  turn  can  be  re-written  in  terms  of  a  function  of  a  smaller 
number of dimensionless variables, leading to the relationship:

$$
o = a_1^{p} \cdots a_k^{r}\,
\varphi\!\left(\frac{b_1}{a_1^{p_1} \cdots a_k^{r_1}}, \ldots, \frac{b_m}{a_1^{p_m} \cdots a_k^{r_m}}\right)
\tag{9}
$$


Furthermore, self-similar solutions correspond to cases where the values of the variables
$b_1, \ldots, b_m$ tend to zero or infinity. Barenblatt [6] considers three cases:

1. Type 1 metamodel.

The function $\varphi$ tends to a non-zero finite limit as $\Pi_j$ tends to zero or infinity. In practice this
means $\varphi$ can be replaced by its limiting expression, and hence f will be a product of power
monomials whose values can be determined by dimensional analysis.

2.  Type 2 metamodel. 

The function $\varphi$ tends to the power law asymptotic expression:


f 

cp  =  uy 

V 


Hi 

na/ 


rr; 


(10)

as $\Pi_j$ tends to zero or infinity. The power law form of the limiting expression still leads to
complete separation of variables, but with characteristic exponents which, in contrast to
the type 1 metamodels, cannot all be determined by dimensional analysis.

3.  Type 3 metamodel. 

Power type asymptotic behaviour is not observed, and the function $\varphi$ has no finite limit
different from zero.

2.2  Structure 

Many  physical  systems  conform  to  the  requirements  for  type  1  or  type  2  metamodel 
approximations.  Systems  that  produce  type  3  metamodels  will  not  be  considered  further. 
Previous  work  has  shown  that  combat  attrition  models  result  in  type  2  metamodels  [2]  [3] 
with  governing  equations  of  the  form: 

$$
\begin{aligned}
\frac{dx}{dt} &= -a^{c}\, y(t)^{d}\, x(t)^{e}, & x(0) &= x_0 \\
\frac{dy}{dt} &= -b^{f}\, x(t)^{g}\, y(t)^{h}, & y(0) &= y_0
\end{aligned}
\tag{11}
$$

For  the  cases  of  interest  here,  the  metamodel  can  be  written  as  a  product  of  power  law 
monomials,  where  some  exponents  may  be  unknown,  of  the  internal  parameters  used  to 
describe the system's dynamics.


3.  Including  the  Environment 

It  was  noted  above  that  metamodels  of  the  intermediate  asymptotic  form  use  the  system's 
internal  parameters  to  describe  its  dynamic  behaviour.  Many  systems  are  adequately 
described  using  just  their  internal  parameters,  without  consideration  of  any  interactions 
between  that  system  and  its  wider  environment.  This  is  a  standard  approach  based  on  the 
assumption  that  the  system  can  be  defined  in  such  a  way  that  changes  in  its  own  internal 
parameters do not involve external interactions. One such example system is the motion of
the  earth  and  its  moon.  For  most  purposes  only  the  properties  of  the  earth  and  moon  need 
to  be  considered.  The  gravitational  effect  of  all  other  masses  (its  environment)  can  be 
ignored. 

However, there are also many systems where the interaction between the system and its
environment cannot be neglected. In such cases, either a more extensive definition of the
system must be used where the new system's dynamics are independent of its
environment, or a simple mechanism for the interaction between the system and
environment developed. One such example system is the First Law of Thermodynamics:

$$dU = dQ + dW \tag{12}$$


where  the  environment  is  treated  as  a  heat  source  or  sink  (dQ)  for  the  system's  internal 
energy  U . 

The  first  approach  is  commonly  used  in  combat  modelling,  which  has  been  reviewed  by 
Taylor  [4],  and  adopted  in  most  current  generation  military  combat  simulations  [7].  This 
has  the  drawback  that  these  system  definitions  are  generally  so  complex  as  to  require  their 
own metamodels to provide insight and so do not simplify the problem.

The second approach has also been used, to a lesser degree, in combat modelling where it
represents changes in strength due to non-combat processes such as disease, accident or
reinforcement [8] and typically produces equations of the form:

$$
\begin{aligned}
\frac{dx}{dt} &= P - a\,y - c\,x, & x(0) &= x_0 \\
\frac{dy}{dt} &= Q - b\,x - d\,y, & y(0) &= y_0
\end{aligned}
\tag{13}
$$


More  complex  variants  of  Lanchester's  Equations  generally  produce  better  agreement 
between  theory  and  the  trends  exhibited  by  historical  data.  However,  the  effect  of  the 
environment on combat outcomes is poorly understood [9].

In  order  to  avoid  developing  complex  models  of  the  mechanisms  for  interaction  between 
the  system  and  its  environment,  which  are  often  not  understood  anyway,  a  simple  model 
of  the  effect  of  interaction  with  a  complex  environment  is  needed.  This  is  not  the 
oxymoron  that  it  first  appears  to  be.  Such  approaches  have  been  used  in  financial 
modelling  for  some  time  and  have  their  origin  in  the  study  of  Brownian  motion  as  a 
stochastic  process. 


4.  Stochastic  Processes 

This section follows Hull's approach [10]. A stochastic process is described by a variable
whose  value  changes  in  time  in  an  uncertain  way.  Such  processes  can  be  discrete,  when 
the  variable's  value  can  only  change  at  specified  fixed  points,  or  continuous  when  the 
value  can  change  at  any  time.  Stochastic  processes  may  also  take  continuous  values,  when 
the  underlying  variable  can  take  any  value  within  a  specified  range,  or  discrete  values 
where only certain specified values are allowed.

A Markov process is a particular type of stochastic process where only the current value of
a variable is relevant for predicting its future evolution. A continuous time, discrete value
Markov process has been demonstrated to produce a stochastic attrition model analogous
to Lanchester's deterministic equations. Most modern combat simulations use Markov
processes to describe attrition. The stochastic theory of attrition has been comprehensively
explored by a number of workers and is readily accessible [11].

A  basic  Wiener  process  is  a  sub-type  of  Markov  process  in  which  changes  to  the  value  of 
the  underlying  variable  during  successive  time  intervals  are  normally  distributed  with  a 
mean  of  zero  and  a  variance  of  "1  per  time  interval".  Brownian  motion  is  a  Wiener  process. 
A variable z follows a Wiener process if it has the following two properties:

• Property 1: The change $\delta z$ during a small time period $\delta t$ is $\delta z = \varepsilon \sqrt{\delta t}$, where $\varepsilon$ is a
random number from a standardised normal distribution $\phi$ with a mean of 0 and
a variance of 1.

• Property 2: The values of $\delta z$ for any two different intervals $\delta t$ are independent.
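The two properties translate directly into a simulation recipe: sum independent standard normal draws, each scaled by the square root of the time step. The sketch below is illustrative only; the function names, seed, step count and horizon are assumptions, and it relies solely on Python's standard `random` module.

```python
import math
import random


def wiener_path(t_end, steps, rng):
    """Return z at steps+1 equally spaced times on [0, t_end], with z(0) = 0."""
    dt = t_end / steps
    z = [0.0]
    for _ in range(steps):
        eps = rng.gauss(0.0, 1.0)               # Property 1: standard normal draw
        z.append(z[-1] + eps * math.sqrt(dt))   # Property 2: each draw is independent
    return z


def sample_variance(values):
    """Unbiased sample variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable (assumed value)
    finals = [wiener_path(4.0, 50, rng)[-1] for _ in range(5000)]
    # For a basic Wiener process the variance of z(T) should be close to T.
    print("sample variance of z(4):", round(sample_variance(finals), 2))
```

Because the variance rate is 1, the spread of the end points grows with the square root of elapsed time rather than linearly.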

A generalised Wiener process differs from the basic Wiener process, with its drift rate of 0
and variance rate of 1, in that its drift rate and variance rate may take any constant values. Such a
process for a variable x has a stochastic differential of the form:

$$dx = a\,dt + b\,dz \tag{14}$$

where a and b are constants and z is the basic Wiener process defined above.

Finally, an Ito process is a generalised Wiener process in which the parameters a and b are
functions of the underlying variable x and time t:

$$
\begin{aligned}
dx &= a(x,t)\,dt + b(x,t)\,dz \\
\delta x &= a(x,t)\,\delta t + b(x,t)\,\varepsilon \sqrt{\delta t}
\end{aligned}
\tag{15}
$$

with a drift rate of $a(x,t)$ and a variance rate of $b(x,t)^2$. The stochastic processes above have
only considered systems with one stochastic variable. This is not an inherent limitation.
Considering a system with two Markov variables x and y, it is straightforward to see that
Lanchester's Equations represent a special case of an Ito process with two coupled
stochastic variables having variance rates equal to 0 and drift rates defined by equation 1
above.

4.1 Black Scholes Model

The standard model for the behaviour of stock prices in time assumes they are a
continuous Markov process with a constant fractional drift rate. Thus the stock price S
with a drift rate of $\mu$ is described by:

$$\frac{dS}{S} = \mu\,dt \tag{16}$$

which is a system with one internal variable S and one internal parameter $\mu$. In practice,
stock prices exhibit considerable stochastic volatility. The Black and Scholes model of stock
prices [12] treats the volatility as resulting from the action of the rest of the market on the
system composed of the one stock price being modelled. Furthermore, it does not attempt
to model the processes by which the market might affect the stock price, arguing that the
mechanisms are too complex to model or are not known. However, by invoking the central
limit theorem they believe that the effect of this interaction can be modelled even when the
mechanisms are not known. This modifies equation 16 to an Ito process as:

$$\frac{dS}{S} = \mu\,dt + \sigma\,dz \tag{17}$$


The first term in this expression results from the interaction between the system's
underlying parameters, while the second results from the action of the wider environment
on the system. The analytic solution for equation 16 is a single exponential function of time.
Adding the Wiener term (equation 17) modifies this solution to a log-normal distribution.
The similarity between this equation and the Lanchester Equations (equation 1) is clear.
This suggests an obvious solution to the question of how to include the effect of the wider
environment in the evaluation of combat attrition rates. Following the Black Scholes
approach:

$$
\begin{aligned}
\frac{dx}{y} &= -a\,dt + \sigma\,dz_2, & x(0) &= x_0 \\
\frac{dy}{x} &= -b\,dt + s\,dz_1, & y(0) &= y_0
\end{aligned}
\tag{18}
$$


where $\sigma$ and $s$ are measures of the environment's contribution, while the $z_i$ are Wiener
processes which may be independent.
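Since no closed form solution for equation 18 is given, sample paths can be explored numerically. The sketch below applies an Euler-Maruyama discretisation to equation 18, read as dx = y(-a dt + sigma dz2) and dy = x(-b dt + s dz1). The function name, the treatment of the two Wiener processes as independent, the stopping rule when a side is annihilated, and all parameter values are assumptions made for illustration, not part of the report.

```python
import math
import random


def simulate_attrition(x0, y0, a, b, sigma, s, t_end, steps, rng):
    """One Euler-Maruyama path of equation 18; returns the final (x, y)."""
    dt = t_end / steps
    root_dt = math.sqrt(dt)
    x, y = float(x0), float(y0)
    for _ in range(steps):
        dz1 = rng.gauss(0.0, 1.0) * root_dt  # Wiener increment driving y
        dz2 = rng.gauss(0.0, 1.0) * root_dt  # Wiener increment driving x
        # Both updates use the strengths from the start of the step.
        x, y = (x + y * (-a * dt + sigma * dz2),
                y + x * (-b * dt + s * dz1))
        if x <= 0.0 or y <= 0.0:  # assumed stopping rule: one side annihilated
            x, y = max(x, 0.0), max(y, 0.0)
            break
    return x, y


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for repeatability (assumed value)
    for _ in range(5):
        x, y = simulate_attrition(1000, 800, 0.01, 0.012, 0.005, 0.005,
                                  10.0, 1000, rng)
        print(f"casualties: X lost {1000 - x:7.1f}, Y lost {800 - y:7.1f}")
```

Setting `sigma` and `s` to zero removes the Wiener terms, so the path collapses to a forward Euler integration of equation 1.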

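The analytic solutions to Lanchester's Equations (equation 1) are monotonically decreasing
functions with exponential components, albeit more complex than for the stock price
model above:

$$
\begin{aligned}
x &= \sqrt{R}\left(-A e^{It} + B e^{-It}\right) \\
y &= A e^{It} + B e^{-It}
\end{aligned}
\tag{19}
$$

where:

$$
I = \sqrt{ab}, \quad R = \frac{a}{b}, \quad
A = \frac{1}{2}\left(y_0 - \frac{x_0}{\sqrt{R}}\right), \quad
B = \frac{1}{2}\left(y_0 + \frac{x_0}{\sqrt{R}}\right)
\tag{20}
$$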


This solution for both x and y reduces to a single exponential when the initial force ratio
($x_0/y_0$) equals the root of the relative effectiveness ($R$). The magnitude of A is a measure of
how evenly the forces are matched, where A = 0 represents neither side having an
advantage and the combat is even. Provided $|A| \ll |B|$ the contribution from the
increasing exponential term in equation 19 will be small and can be treated as a
perturbation. The leading term in this solution is then also a single exponential. The
solutions can be considered as a dynamical system with a positive Maximal Lyapunov
Exponent [13] that increases with $|A|$.
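The role of the coefficients A and B can be checked numerically. The sketch below evaluates the closed form solution of equation 1 with x = sqrt(R)(-A e^{It} + B e^{-It}), y = A e^{It} + B e^{-It}, I = sqrt(ab), R = a/b, A = (y0 - x0/sqrt(R))/2 and B = (y0 + x0/sqrt(R))/2. The function name and the numerical values are illustrative assumptions.

```python
import math


def lanchester_solution(x0, y0, a, b, t):
    """Closed form solution of equation 1 at time t.

    Returns (x, y, A, B) so that the evenness measure A can be inspected.
    """
    rate = math.sqrt(a * b)        # I in equation 20
    root_r = math.sqrt(a / b)      # square root of the relative effectiveness R
    coeff_a = 0.5 * (y0 - x0 / root_r)
    coeff_b = 0.5 * (y0 + x0 / root_r)
    grow, decay = math.exp(rate * t), math.exp(-rate * t)
    x = root_r * (-coeff_a * grow + coeff_b * decay)
    y = coeff_a * grow + coeff_b * decay
    return x, y, coeff_a, coeff_b


if __name__ == "__main__":
    # Evenly matched case: x0/y0 equals sqrt(R), so A vanishes (assumed values).
    print(lanchester_solution(200.0, 100.0, 0.04, 0.01, 5.0))
    # Mismatched case: A is non-zero and the growing exponential contributes.
    print(lanchester_solution(1000.0, 800.0, 0.01, 0.012, 10.0))
```

When A vanishes, both strengths decay as a single exponential with rate I, which is the regime in which the log-normal approximation is expected to hold.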

Provided $|A| \ll |B|$, the proposed attrition model (equation 18) also has a log-normal
distribution as its first order solution for casualty values. Such a distribution would not be
expected from force strength distributions, which are dominated by the distribution of
initial values ($x_0$ and $y_0$). These are controlled by different dynamics and were first
investigated  by  Richardson  [14].  The  conditions  under  which  this  approximation  can  be 
expected  to  hold  will  be  examined  in  the  next  section,  along  with  a  number  of  issues 
concerning  the  use  of  historical  data  for  comparison  with  combat  models.  A  general  closed 
form  analytic  solution  for  equation  18  is  still  being  sought. 


5.  Historical  Analysis 

Determination  of  whether  Lanchester's  Equations  (equation  1)  need  to  be  augmented  by 
terms  describing  the  effect  of  the  wider  combat  environment,  as  proposed  in  equation  18, 
depends  on  whether  behaviour  that  can  be  attributed  to  the  effect  of  such  terms  can  be 
found  in  the  historical  record. 

5.1  Historical  Data 

Despite  the  large  number  of  recorded  battles  throughout  history,  the  number  with  usable 
data  is  small.  Any  compilation  of  battle  data,  being  a  subset  of  all  battles,  constitutes  a 
sample.  A  useful  database  will  have  a  sample  of  battle  data  that  is  representative  of 
patterns  observed  in  the  population  of  all  battles. 

To  validate  differential  models  of  attrition,  such  as  Lanchester's  equations,  force  and 
casualty  levels  for  both  sides  intermediate  to  the  starting  and  finishing  quantities  are 
required. That level of detail is rarely available and often does not exist. The author has not
found  a  single  instance  where  sufficient  data  of  this  type  is  available  to  carry  out  the 
examination  proposed  in  section  4.  An  alternate  method,  using  only  initial  and  final  values 
of  engaged  force  strengths  is  developed  in  the  next  section. 

Some  data  compilations  of  battles  throughout  recorded  history  exist,  and  have  been 
aggregated  and  used  in  previous  analyses  [15].  The  component  databases  were  put 
together  by  a  number  of  workers  for  a  variety  of  different  purposes.  The  aggregate 
database  covers  a  wide  range  of  force  ratios  and  while  emphasising  20th  Century  battles, 
has  reasonable  coverage  back  to  1600.  It  emphasises  land  battles,  but  includes  one  air 
campaign.  Most  report  just  initial  and  final  values,  but  some  time  correlated  data  is 
included.  Inevitably,  there  are  issues  which  must  be  understood  when  attempting  to  reuse 
such data for purposes other than that for which it was compiled. Many of these have been
previously studied [15] [16], including:

• potential bias in the battle narrative due to most accounts being written by the
victor or for propaganda purposes,

• many reported results are qualitative or approximate,

• many reported results must be incorrect, including dispute over which side won,

• when determining force strengths, should support or service personnel be included,

• when determining casualties, should prisoners be included,

• how should force strength be obtained from numbers of participating staff, should
some form of force scoring such as the Quantified Judgement Method be used,

• how should the effect of leadership, initiative, surprise, terrain and weather be
included.

While  some  of  the  authors  of  the  database  components  have  attempted  to  address  some  of 
these  issues  [16],  especially  questions  of  how  to  determine  force  strength  and  casualties, 
there  remains  a  question  regarding  the  accuracy  of  much  of  the  original  reporting, 
especially  for  battles  prior  to  the  19th  Century.  Nevertheless,  the  database  reviewed  here 
[15] is the best available source to carry out the proposed examination.

Hartley  has  argued  that  the  individual  datasets  comprising  the  database  are  random 
samples,  because  they  were  independently  derived  [15].  This  argument  is  similar  to  the 
inverse  of  Bootstrap  sampling  [17],  which  has  been  used  to  improve  the  accuracy  of 
measures  of  sample  statistical  descriptors.  A  short  account  of  sampling  bias  is  given  in 
Appendix A. Aggregation will improve the accuracy of statistical estimators. However, the
outcome assumed by Hartley, that while the database is not a true random sample it can
be treated as if it were effectively random, requires further consideration. It is difficult to
avoid  the  conclusion  that  the  database  is  little  more  than  an  aggregate  of  accidental  sample 
databases. 

The  individual  component  databases  are  the  product  of  the  recursive  application  of  the 
sub-sampling  process.  The  population  consists  of  all  battles.  This  is  first  sampled  to 
produce  the  set  of  all  recorded  battles.  Many,  especially  smaller  engagements,  are  never 
recorded. The requirement that both the initial and final values of force strengths are
known  produces  another  sub-sampling  stage  to  generate  the  set  of  all  recorded  battles 
with  usable  data.  This  sampling  process  also  discriminates  against  smaller  battles.  Larger 
battles  receive  more  attention  and  hence  are  more  likely  to  have  their  attributes  recorded. 
The  individual  databases  are  themselves  samples  of  that  sample. 

Even  if  the  final  sampling  process  was  random,  the  process  of  recording  history  generates 
a  bias  towards  larger  battles.  By  stratifying  the  aggregate  database  according  to  battle  size, 
the  lowest  stratum  containing  the  smallest  battles  can  be  seen  to  be  the  most  affected. 
Together  with  the  improved  accuracy  from  aggregation,  strata  other  than  the  lowest  can 
be  treated  as  if  they  were  the  product  of  random  sampling.  This  conclusion  will  be  tested 
using  the  database. 

Previous  work  by  the  author  [18]  has  shown  that  the  initial  and  final  strengths  for  both 
sides of a battle are related by:

(21)


and that this relationship is a result of Lanchester's model for attrition (equation 1).
Helmbold, and later Hartley [15], have shown that the data from an ensemble of different
battles  also  follows  equation  21. 


Lanchester's  Equations,  and  the  proposed  attrition  equations  (equation  18),  describe  the 
behaviour  of  a  single  system  in  time.  However,  the  historical  databases  contain 
information  about  an  ensemble  of  battles,  each  potentially  with  different  values  of  attrition 
coefficients  a  and  b.  Should  the  results  from  such  an  ensemble  follow  the  behaviour 
expected of a single system?


Hartley [15] has examined this issue at length. He considered several hypotheses, rejecting
all  save  the  conclusion  that  the  relationship  between  the  data  from  an  ensemble  of 
different  battles  was  a  direct  consequence  of  the  form  of  the  equations  governing  the 
attrition  process.  In  other  words,  the  behaviour  governing  an  individual  battle  was 
reflected  in  the  behaviour  of  an  ensemble  of  battles.  Indeed,  Helmbold's  earlier  work  on 
the  validation  of  Lanchester's  equations  using  historical  data  [16]  found  this  applies  to  a 
number  of  different  parameters  including  the  defender's  advantage. 


Using  Hartley's  conclusions,  albeit  more  empirical  than  rigorous,  and  the  random  nature 
of  the  data  samples  for  other  than  the  lowest  stratum,  it  is  expected  that  the  distribution  of 
casualties  during  a  battle  (equation  18)  should  also  be  reflected  in  the  distribution  of 
casualties from an ensemble of battles. This conclusion will be supported, and less likely to
be  the  result  of  artefact  or  data  bias,  if  such  patterns  are  only  found  in  quantities  affected 
by  attrition  (equation  18)  and  not  in  other  quantities  such  as  initial  force  strengths. 


The  solutions  (equation  19)  to  Lanchester's  Equations  are  monotonically  decreasing  and 
can be approximated as single exponential functions provided $|A| \ll |B|$. This
approximation  defines  the  region  of  validity  for  which  log-normal  solutions  to  equation  18 
can  be  expected.  It  corresponds  to  the  condition  that  the  two  sides  are  not  significantly 
mismatched  in  their  combat  potential.  Just  as  the  process  of  data  sampling  has  an  inherent 
bias  towards  larger  battles,  it  also  contains  a  bias  towards  more  "even"  battles.  An  attacker 
that recognises it is outmatched has the option to use its initiative and not attack. Similarly,
a  defender  that  recognises  its  inferiority  usually  tries  to  improve  its  position  before 
accepting  battle.  Protagonists  appear  more  willing  to  accept  battle  when  they  are  evenly 
matched  and  have  a  better  expectation  of  a  favourable  outcome.  Battles  with  smaller 
values for $|A|$ can therefore be expected to be over-represented in any data compilation. It
is then reasonable to expect that the condition $|A| \ll |B|$ will apply to more data in the
dataset, consistent with the requirement for battle casualty data to exhibit a log-normal
distribution.

5.2 Results

The  database  contains  a  number  of  duplicate  entries.  One  reason  for  this  was  to 
accommodate  the  different  reporting  of  particular  battles,  including  the  identity  of  the 
winning  side.  The  present  work  is  not  concerned  with  differentiating  between  winner  and 
loser, which necessitates the removal of duplicate entries from the database.

Most  previous  work  on  frequency  distribution  of  force  sizes  and  casualties  has  only 
examined  total  combatants  or  casualties  [16].  The  present  work  examines  the  applicability 
of  equation  18  for  attrition  modelling  and  so  must  consider  each  side  separately.  Hence 
each battle will contribute two data points to the analysis.

5.2.1  Force  Size  Distribution 


The  frequency  distribution  of  initial  force  sizes  was  determined  by  dividing  the  range  of 
force  sizes  into  intervals  of  1000  and  counting  the  number  of  times  a  force  strength  from 
the  database  occurred  in  each  interval.  This  is  shown  on  a  logarithmic  scale  in  Figure  1. 


Figure 1: Force Size Distribution, and regression coefficient of determination

This  is  observed  to  follow  an  inverse  power  law  relationship,  which  might  have  been 
expected  on  the  basis  of  previous  work  dating  back  to  Richardson  [14].  He  found  that  the 
frequency  distribution  of  casualties  in  wars  followed  an  inverse  power  law.  Such  fractal 
relationships  exhibit  scale  invariance.  Hence  it  is  not  unreasonable  to  find  similar 
relationships for other similar quantities, such as the force size distribution for battles.
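
The binning and straight-line fit on logarithmic axes described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for Figure 1: the function names, the use of interval midpoints and base-10 logarithms, and the sample force strengths are all assumptions introduced here.

```python
import math
from collections import Counter

def force_size_histogram(sizes, width=1000):
    """Count force strengths per interval of `width` (keys are interval lower bounds)."""
    return Counter((int(s) // width) * width for s in sizes)

def fit_power_law(hist, width=1000):
    """Least-squares line through (log10 interval midpoint, log10 count).

    Returns (slope, intercept, r_squared). An inverse power law gives slope < 0.
    Empty intervals are skipped because log10(0) is undefined.
    """
    pts = [(math.log10(lo + width / 2), math.log10(n))
           for lo, n in hist.items() if n > 0]
    m = len(pts)
    mean_x = sum(x for x, _ in pts) / m
    mean_y = sum(y for _, y in pts) / m
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    syy = sum((y - mean_y) ** 2 for _, y in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical force strengths, for illustration only (not taken from the database).
sizes = [800, 1200, 1500, 2400, 2600, 3100, 5200, 7900, 12000, 45000]
hist = force_size_histogram(sizes)
print(sorted(hist.items()))
print(fit_power_law(hist))
```

A negative fitted slope corresponds to the inverse power law behaviour; with real data the lowest intervals would be inspected separately, since they are the ones expected to fall below the fitted line.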

It  should  also  be  noted  that  the  data  progressively  drops  below  the  line  of  the  linear 
relationship,  obtained  by  regression  analysis,  for  smaller  force  sizes.  This  is  the  behaviour 
expected from the inherent data bias favouring larger battles, which arises from the
sampling techniques described above.

The  database  contains  some  1500  force  size  entries.  Extrapolating  from  the  best  fit  line, 
excluding  those  entries  affected  by  bias,  indicates  that  an  unbiased  database  covering  the 
same  domain  should  be  expected  to  contain  at  least  1800  entries  if  small  size  battles  were 
not  underrepresented. 

It  should  also  be  noted  that  the  distribution  of  initial  force  sizes  does  not  exhibit  any 
indication  of  the  influence  of  a  normal  distribution.  The  completely  different  behaviour  of 
the initial strength frequency and the casualty frequency supports the contention that such
behaviour  results  from  the  attrition  process  and  is  not  an  artefact  of  the  sampling  or 
analysis  procedure. 

5.2.2  Casualty  Distribution 

The  distribution  of  the  natural  logarithm  of  each  side's  battle  casualties  was  determined  by 
dividing  the  range  of  observed  logarithm  of  casualty  values  into  intervals  of  size  1,  which 
is  equivalent  to  the  size  for  adjacent  intervals  having  a  ratio  of  1.65.  It  results  in  an  even 
spread  of  casualty  values  on  a  logarithmic  scale  which  is  necessary  for  the  accurate 
representation  of  its  distribution.  The  number  of  times  the  logarithm  of  the  casualty  value 
from  the  database  occurred  in  each  interval  was  then  counted.  This  is  shown  in  Figure  2. 
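
A minimal sketch of this logarithmic binning is given below. The interval width is left as a parameter; adjacent intervals on the original casualty scale then differ in size by a factor of exp(width). The function name and the example casualty figures are assumptions made for illustration.

```python
import math
from collections import Counter

def log_casualty_histogram(casualties, width=1.0):
    """Bin ln(casualties) into intervals of `width`; keys are interval lower edges."""
    counts = Counter()
    for c in casualties:
        if c <= 0:
            continue  # the logarithm is undefined for zero or negative counts
        k = math.floor(math.log(c) / width)
        counts[k * width] += 1
    return dict(sorted(counts.items()))

# Hypothetical casualty figures, for illustration only.
print(log_casualty_histogram([1, 2, 3, 10, 1000]))
print(math.exp(1.0))  # size ratio of adjacent intervals for width = 1
```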

The  frequency  distribution  forms  a  bell  shaped  curve,  but  with  considerable  stochastic 
variability  across  the  peak.  This  limits  the  ability  to  determine  what  form  the  distribution 
takes,  in  particular  whether  it  is  consistent  with  a  normal  distribution.  Table  1  contains  the 
summary  statistics  for  the  logarithm  of  casualties  from  the  database. 

Table 1: Log-Casualty Database Descriptive Statistics

Database Statistic      Value
Number of Entries       1498
Mean                    7.485
Mode                    8.006
Median                  7.474
Standard Deviation      2.003
Skewness                0.105
Kurtosis                -0.153
Minimum Value           0
Maximum Value           13.693

The  cumulative  casualty  distribution  was  formed  by  summing  the  number  of  occurrences 
with  casualties  greater  than  the  specified  value  and  is  also  shown  in  Figure  2.  The 
cumulative  distribution  for  occurrences  greater  than  the  specified  value  was  chosen  as  it 
confines  the  effect  of  data  bias  to  a  few  entries  at  the  lower  end  of  the  scale  instead  of 
incorporating  the  bias  in  all  the  data  points. 
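
The "greater than" cumulative count can be sketched as below, together with a small check of why it confines the bias: removing under-represented small entries changes only the counts at thresholds below them. The function name and the sample values are assumptions made for illustration.

```python
import bisect

def cumulative_greater_than(values, thresholds):
    """For each threshold, count the entries strictly greater than it."""
    ordered = sorted(values)
    n = len(ordered)
    return [n - bisect.bisect_right(ordered, t) for t in thresholds]

# Hypothetical log-casualty values, for illustration only.
full = [0.5, 1.5, 2.5, 2.5, 7.0]
biased = [v for v in full if v > 1.0]   # the smallest entry is missing
thresholds = [0, 1, 2, 3, 7]
print(cumulative_greater_than(full, thresholds))
print(cumulative_greater_than(biased, thresholds))
```

In this toy case the two lists differ only at the lowest threshold, whereas a "less than" cumulative count would carry the missing entry into every point.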

Table 1 indicates that the data is slightly skewed towards higher casualties, which is what
is  expected  from  our  consideration  of  data  bias  issues.  The  previous  section  has  indicated 
that  the  database  should  contain  at  least  1800  entries,  if  there  was  no  bias  against  smaller 
battles.  This  bias  can  be  compensated  for  by  calculating  the  curve  that  the  distribution 
would  follow,  given  that  number  of  entries,  assuming  a  normal  distribution  and  using 
values  for  the  mean  and  standard  deviation  chosen  from  the  historical  data.  Given  the 
logarithmic  nature  of  the  scale,  the  additional  entries  can  be  assumed  not  to  change  the 
distribution  mean  by  much.  However,  they  will  influence  the  standard  deviation  more. 
Figure  2  also  shows  the  theoretical  cumulative  frequency  distribution  assuming  that  the 
logarithms of casualty values are normally distributed, for a total of 1800 entries, a mean of
7.5  and  a  standard  deviation  of  2.2. 
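
The theoretical curve can be sketched as the expected number of entries above a given log-casualty value under a normal model. The parameter values (1800 entries, mean 7.5, standard deviation 2.2) are those quoted in the text; the function name and the use of the complementary error function are choices made for this illustration.

```python
import math

def expected_count_above(x, total=1800, mean=7.5, sd=2.2):
    """Expected number of entries with ln(casualties) > x if the logarithms are normal."""
    z = (x - mean) / (sd * math.sqrt(2.0))
    return total * 0.5 * math.erfc(z)  # total * (1 - Phi((x - mean) / sd))

for x in range(0, 15, 2):
    print(x, round(expected_count_above(x), 1))
```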


Figure  2:  Log-Casualty  Distribution,  Casualty  Cumulative  Distribution,  and  Theoretical 
Cumulative Normal Distribution

The  close  agreement  between  the  historical  data  and  the  theoretical  distribution,  except  for 
the lowest database stratum where bias is expected to have produced under-representation,
is apparent. The relationship between historical data and theoretical result
can  be  observed  more  readily  by  looking  at  the  correlation  between  the  two  sets  of  values. 

Using  standard  statistical  techniques  [19],  the  correlation  coefficient  between  the  two  sets 
of values, ignoring the lowest database stratum, was determined as 0.9965. This can
also be seen graphically by plotting the historical cumulative casualty distribution as a
function  of  the  theoretical  expectation,  as  seen  in  Figure  3.  Again  ignoring  the  lowest 
stratum,  regression  analysis  produces  the  best  fit  relationship  shown,  with  a  coefficient  of 
determination  of  0.9958. 

Figure 3: Force Size Frequency, and regression coefficient of determination

One  standard  interpretation  [19]  of  the  coefficient  of  determination  is  that  99.58%  of  the 
variability  in  the  cumulative  log-casualty  frequency  distribution  observed  in  the  historical 
record  can  be  explained  by  the  variation  in  a  casualty  model  with  a  log-normal 
distribution,  such  as  that  proposed  in  equation  18. 
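
The two statistics quoted above can be computed along the following lines. This is only a sketch: the functional form of the best fit relationship is not stated here, so a straight line with intercept is assumed, and the function names and sample numbers are introduced for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def r_squared(ys, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Hypothetical theoretical (xs) and historical (ys) cumulative counts.
xs, ys = [1, 2, 3, 4], [1, 3, 2, 4]
slope, intercept = linear_fit(xs, ys)
print(pearson_r(xs, ys), r_squared(ys, [slope * x + intercept for x in xs]))
```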

This  is  consistent  with  the  expectation  from  the  hypothesis  proposed  in  section  5.1. 


6.  Conclusions 

Hartley's  historical  battle  database  [15],  which  previously  has  been  used  to  validate  the 
fractal  nature  of  Lanchester's  attrition  equations  by  including  spatial  effects  [3],  has  been 
used here to examine whether attrition is adequately modelled by a Markov process. It has
been  shown  that  the  frequency  distribution  of  battle  casualties  is  consistent  with  that 
produced  when  attrition  is  modelled  as  an  Ito  process.  The  additional  Wiener  term  can  be 
regarded as representing the impact of the wider environment on attrition rates.

This battle database, and indeed all such databases, was shown to include an inherent bias
which under-represents the number of small battles. While the effect of such bias was
observed, by incorporating strata sampling concepts it is possible to confine the effects of
such  bias  into  a  single  stratum  and  a  small  number  of  data  points,  which  can  then  be 
allowed  for. 

The shape of the distribution resulting from the assumption of a controlling Ito process
was  not  observed  in  the  initial  force  strength  distribution.  This  supports  the  contention 
that  such  distributions  result  from  the  attrition  process  itself  and  are  not  artefacts  of  the 
sampling  or  analysis  procedure. 

The wider implications of the need to revise Lanchester's Equations by the inclusion of
Wiener terms, as shown below, representing the impact of the larger environment on attrition
rates require further investigation.

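\begin{equation}
\begin{aligned}
\frac{dx}{y} &= -a\,dt + \sigma\,dz_2, & x(0) &= x_0 \\
\frac{dy}{x} &= -b\,dt + \epsilon\,dz_1, & y(0) &= y_0
\end{aligned}
\tag{22}
\end{equation}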
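
To give a feel for how such a system behaves, the sketch below integrates Lanchester's equations with added Wiener terms using a simple Euler-Maruyama scheme, written as dx = y(-a dt + sigma dz2) and dy = x(-b dt + eps dz1). It rests on several assumptions that are not taken from this report: the two Wiener processes are treated as independent, the noise coefficients are named `sigma` and `eps`, strengths are clamped at zero, and all parameter values are hypothetical.

```python
import math
import random

def simulate_attrition(x0, y0, a, b, sigma, eps, dt=0.01, steps=1000, seed=0):
    """Euler-Maruyama path of dx = y(-a dt + sigma dz2), dy = x(-b dt + eps dz1).

    Returns a list of (time, x, y). dz1 and dz2 are assumed independent.
    """
    rng = random.Random(seed)
    x, y = float(x0), float(y0)
    path = [(0.0, x, y)]
    root_dt = math.sqrt(dt)
    for k in range(1, steps + 1):
        dz1 = rng.gauss(0.0, root_dt)  # Wiener increment, variance dt
        dz2 = rng.gauss(0.0, root_dt)
        dx = y * (-a * dt + sigma * dz2)
        dy = x * (-b * dt + eps * dz1)
        x = max(x + dx, 0.0)  # a force cannot have negative strength
        y = max(y + dy, 0.0)
        path.append((k * dt, x, y))
        if x == 0.0 or y == 0.0:
            break  # one side has been annihilated
    return path

# Hypothetical parameter values, for illustration only.
path = simulate_attrition(1000, 800, a=0.1, b=0.1, sigma=0.02, eps=0.02)
print(path[-1])
```

Setting `sigma` and `eps` to zero recovers the deterministic equations, for which the quantity b x^2 - a y^2 is approximately conserved along the path; this provides a quick sanity check of the integrator.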

Appendix A: Bias in Sampling Theory

Sampling  systems  are  used  to  obtain  estimates  of  properties  of  the  population  being 
studied,  and  will  be  judged  by  how  good  the  estimates  obtained  are.  A  good  sampling 
system will, on occasions, give an estimate which is far from the true value, just as a poor
system  may,  very  occasionally,  give  an  estimate  very  close  to  the  true  value.  A  system  is 
better  judged  by  the  frequency  distribution  of  the  many  estimates  which  are,  or  could  be, 
obtained  by  repeated  sampling.  A  good  system  would  give  a  frequency  distribution  with 
small  variance,  and  mean  estimate  about  the  same  as  the  true  value.  The  difference 
between the mean estimate and the true value is called the bias.

Bias  may  arise  from  a  poor  method  of  analysis,  but  more  often  from  a  poor  choice  of 
samples, or from the method whereby the measurements or counts are made or the samples
are obtained. If the size of sample is increased, or the data of two or more samples combined,
then  the  bias  will  remain  unaltered,  but  the  variance  will  be  reduced.  Bias  can  normally 
only  be  detected  and  hence  eliminated  by  careful  examination  of  the  whole  sampling 
process from beginning to end.
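
The claim that a larger sample reduces the variance but leaves the bias unaltered can be illustrated with a small simulation. The set-up is hypothetical: a population of sizes 1 to 100 in which each member's chance of being sampled is proportional to its size, loosely mimicking a record that favours larger battles. Sampling is with replacement, and all names are introduced for this sketch.

```python
import random
import statistics

def repeated_estimates(population, weights, sample_size, repeats, seed=0):
    """Mean of a size-biased sample (drawn with replacement), repeated many times."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, weights=weights, k=sample_size))
            for _ in range(repeats)]

population = list(range(1, 101))        # hypothetical sizes
true_mean = statistics.fmean(population)
weights = population                    # chance of selection proportional to size

small = repeated_estimates(population, weights, sample_size=10, repeats=2000)
large = repeated_estimates(population, weights, sample_size=100, repeats=2000)

bias_small = statistics.fmean(small) - true_mean
bias_large = statistics.fmean(large) - true_mean
var_small = statistics.variance(small)
var_large = statistics.variance(large)
print(bias_small, bias_large)   # similar for both sample sizes
print(var_small, var_large)     # much smaller for the larger sample
```

For this population the expected size-biased mean is 67 against a true mean of 50.5, so both sample sizes should show a bias near 16.5, while the variance of the estimates falls roughly in proportion to the sample size.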

The  basic  concept  in  all  sampling  is  the  random  sample.  A  sample  of  objects  from  a 
population  is  random  if  all  the  members  of  the  population  have  an  equal  chance  of 
appearing  in  the  sample.  It  is  very  important  to  remember  that  this  applies  to  all  members 
of the population, exceptional as well as typical members. If random numbers, or a similar
randomizing process, are not used, then it is likely that not all individuals in the population
will have equal chances of appearing in the sample. If there is any correlation between
the  quantity  being  measured  and  probability  of  appearing  in  the  sample,  the  result  may  be 
biased,  perhaps  strongly. 

When  sampling  a  heterogeneous  population  the  precision  achieved  can  be  increased  and 
the  risk  of  bias  reduced  by  dividing  the  population  into  sections,  each  relatively 
homogeneous,  and  sampling  each  section  (or  stratum)  separately.  Each  stratum  is  then 
sampled  independently,  and  estimates  obtained  for  each.  These  can  then  be  combined  to 
give the estimate for the whole population. If entire groups of a heterogeneous population
are excluded from a sample, there are no adjustments that can produce representative
estimates of the entire population. However, if some groups are underrepresented and the
degree of under-representation can be quantified, then sample weights can correct the bias.
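
A worked example of correcting a known degree of under-representation with sample weights is sketched below. Each entry is weighted by the reciprocal of its stratum's inclusion rate. The strata, rates and values are hypothetical and chosen so that the arithmetic is easy to follow.

```python
def weighted_mean(values, inclusion_rates):
    """Mean with each value weighted by 1 / (inclusion rate of its stratum)."""
    weights = [1.0 / r for r in inclusion_rates]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Hypothetical population: 10 small battles (value 100) and 5 large battles (value 1000),
# so the true mean is (10 * 100 + 5 * 1000) / 15 = 400.
# Hypothetical sample: only 2 of the 10 small battles were recorded (rate 0.2),
# while all 5 large battles were recorded (rate 1.0).
sample_values = [100, 100, 1000, 1000, 1000, 1000, 1000]
sample_rates = [0.2, 0.2, 1.0, 1.0, 1.0, 1.0, 1.0]

raw_mean = sum(sample_values) / len(sample_values)
corrected = weighted_mean(sample_values, sample_rates)
print(raw_mean, corrected)
```

The unweighted sample mean is pulled towards the large battles, while the weighted mean recovers the population value. As noted above, no such correction is available if a stratum is missing from the sample altogether.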

When  the  population  being  sampled  is  extensive  or  complex,  the  practical  problems  in 
taking  a  simple  random  sample  are  great,  and  the  time  taken  for  even  a  small  sample  may 
be  large.  The  time  required  to  obtain  a  sample  of  a  given  size  may  be  greatly  reduced  by 
carrying  out  the  sampling  in  two  stages.  First  the  complete  population  may  be  divided 
into  a  number  of  distinct  primary  units  or  subpopulations,  and  from  these  a  sample  is 
taken.  From  each  of  these  sampled  sub-populations  a  secondary  sample,  or  subsample  of 
individuals is taken.

Weakest  of  all  sampling  procedures,  accidental  sampling  involves  using  what  is  available, 
and  most  convenient,  as  a  sample  pool. 